CorAIt - A Non-native Speech Database for Italian

Author

  • Claudia Roberta Combei
Abstract

CorAIt is a non-native speech database for Italian, freely accessible online for academic research purposes. It was specifically designed to meet the requirements of a larger research project on foreign-accented Italian speech. The corpus aims to provide a uniform collection of speech samples produced by non-native speakers of Italian. To date, 105 non-native speakers have been recorded, whose mother tongue is French, Romanian, Spanish, English, German, or Russian. The corpus also includes a control group of 16 native Italian speakers. It contains almost 8 hours of audio material, covering both read speech (first and second reading) and spontaneous speech. This paper argues for the need for this type of database, describes the steps involved in its construction, and presents the features of CorAIt.


Similar resources

Preliminary Investigations in Automatic Recognition of English Sentences Uttered by Italian Children

This paper reports on an initial research activity in the area of non-native children's speech recognition that was carried out by exploiting two databases of children's speech: one consisting of speech collected from native English children, the other consisting of English sentences read by Italian learners of English in the same age range as the native speakers. By exploiting the corpus of native spee...


Investigating automatic recognition of non-native children's speech

This paper presents an initial effort in the area of non-native children's speech recognition by exploiting two databases of children's speech: one consisting of speech collected from native English children, the other consisting of English sentences read by Italian learners of English in the same age range as the native speakers. First, a baseline speech recognizer for British English was trained on th...


Multilingual non-native speech recognition using phonetic confusion-based acoustic model modification and graphemic constraints

In this paper we present an automated approach for non-native speech recognition. We introduce a new phonetic confusion concept that associates sequences of native language (NL) phones to spoken language (SL) phones. Phonetic confusion rules are automatically extracted from a non-native speech database for a given NL and SL using both NL’s and SL’s ASR systems. These rules are used to modify th...


A Database for the Analysis of Cross-Lingual Pronunciation Variants of European City Names

This paper reports on a speech database that includes non-native pronunciation variants of city names/town names from several European languages. The database is designed as a research tool for the study of pronunciation variants in this specific domain that occur in different groups of non-native speakers. The ongoing data collection currently comprises 20 to 27 native speakers of 3 languages ...


Recognition of non-native German speech with multilingual recognizers

In this study we present different approaches to the recognition of non-natives. With a corpus in German spoken by speakers with 56 different first languages, the Strange Corpus, we perform recognition experiments with monolingual and multilingual recognizers. Among others, we compared two German recognizers, one that was trained in addition with non-native (Italian) speech and the other trained wit...



Publication date: 2017